
Algorithms for Binary Neural Networks

FIGURE 3.23
Kernel weight distribution of the first binarized convolutional layer of BONNs. Before training, the kernels are initialized as a single-mode Gaussian distribution. From the 2nd epoch to the 200th epoch, with λ fixed to 1e4, the distribution of the kernel weights becomes increasingly compact with two modes, which confirms that the Bayesian kernel loss can regularize the kernels into a distribution well suited for binarization.

two-mode GMM style. Figure 3.25 shows the evolution of the binarized values during the training process of XNOR-Net and BONN. The two different patterns indicate that the binarized values learned in BONN are more diverse.
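The regularizing effect illustrated in Figure 3.23 can be sketched as a penalty that pulls each kernel toward a symmetric two-mode Gaussian centered at ±μ. The following NumPy snippet is a minimal sketch under that assumption; the function name, the shared mode magnitude μ estimated as the mean absolute weight, and the simple quadratic form are illustrative simplifications, not the exact Bayesian kernel loss of BONN.

```python
import numpy as np

def two_mode_kernel_penalty(w, lam=1e4):
    """Sketch of a two-mode (±mu) regularizer on kernel weights.

    lam plays the role of the lambda = 1e4 setting mentioned in the
    text; the exact BONN Bayesian kernel loss has a different form.
    """
    mu = np.abs(w).mean()                      # shared mode magnitude (assumption)
    # Penalize each weight's distance from the nearest mode, +mu or -mu.
    return lam * np.mean((np.abs(w) - mu) ** 2)
```

Weights that already sit exactly at the two modes incur zero penalty, so gradient descent on this term compacts the distribution around ±μ, qualitatively matching the evolution shown in the figure.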

Effectiveness of Bayesian Feature Loss on Real-Valued Models: We apply our Bayesian feature loss to real-valued models, including ResNet-18 and ResNet-50 [84]. We retrain these two backbones with our Bayesian feature loss for 70 epochs, setting the hyperparameter θ to 1e3. The SGD optimizer has an initial learning rate of 0.1. We use

FIGURE 3.24
Weight distributions of XNOR and BONN based on WRN-22 (2nd, 8th, and 14th convolutional layers) after 200 epochs. The difference between the XNOR and BONN weight distributions indicates that the kernels are regularized across the convolutional layers by the proposed Bayesian kernel loss.
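The feature-level counterpart described above, applied when retraining the real-valued ResNet backbones, can be sketched as a center-loss-style term that pulls each feature vector toward its class center, weighted by θ. This is an assumption about the loss form for illustration only; the function name, the class-center array, and the plain squared-distance penalty are hypothetical, with θ = 1e3 taken from the setting in the text.

```python
import numpy as np

def feature_center_penalty(features, labels, centers, theta=1e3):
    """Center-loss-style sketch of a feature regularizer.

    features: (batch, dim) feature vectors from the backbone.
    labels:   (batch,) integer class labels.
    centers:  (num_classes, dim) per-class feature centers (assumed learnable).
    theta:    weight of the penalty, set to 1e3 in the text.
    """
    diffs = features - centers[labels]         # distance to own class center
    return theta * np.mean(np.sum(diffs ** 2, axis=1))
```

Minimizing this term alongside the cross-entropy objective compacts intra-class features around their centers, which is the qualitative effect a feature-level Bayesian loss would target.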